Biostatistics For Dummies, 2nd Edition (Monika Wahi, John Pezzullo)

342 PART 6 Analyzing Survival Data

Assessing goodness-of-fit and predictive ability of the model

There are several measures of how well a regression model fits the survival data. These measures can be useful when you’re choosing among several different models:

» Should you include a possible predictor variable (like age) in the model?

» Should you include the squares or cubes of predictor variables in the model (meaning including age² or age³ in addition to age)?

» Should you include a term for the interaction between two predictors?
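As a minimal sketch of what these questions mean in practice, the Python snippet below builds squared, cubed, and interaction columns from a tiny made-up data set. The records, the column names (`age`, `radiation`, `age_sq`, and so on), and the candidate model lists are all hypothetical and only illustrate the idea; they aren't taken from the example in this chapter.

```python
# Hypothetical participant records; names and values are illustrative only.
participants = [
    {"age": 50.0, "radiation": 0.0},
    {"age": 62.0, "radiation": 1.0},
    {"age": 71.0, "radiation": 1.0},
]


def add_candidate_terms(row):
    """Return a copy of the row with squared, cubed, and interaction terms."""
    expanded = dict(row)
    expanded["age_sq"] = row["age"] ** 2                         # age squared
    expanded["age_cu"] = row["age"] ** 3                         # age cubed
    expanded["age_x_radiation"] = row["age"] * row["radiation"]  # interaction
    return expanded


expanded_rows = [add_candidate_terms(r) for r in participants]

# Candidate models differ only in which columns they use as predictors.
candidate_models = {
    "linear": ["age", "radiation"],
    "quadratic": ["age", "age_sq", "radiation"],
    "interaction": ["age", "radiation", "age_x_radiation"],
}

for name, columns in candidate_models.items():
    print(name, columns)
```

You would fit each candidate model separately and then compare the fits using goodness-of-fit measures.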

Your software may offer one or more of the following goodness-of-fit measures:

» A measure of agreement between the observed and predicted outcomes called concordance (see the bottom of Figure 23-4). Concordance indicates the extent to which participants with higher predicted hazard values had shorter observed survival times, which is what you’d expect. A concordance of 0.5 means agreement no better than chance, and 1 means perfect agreement. Figure 23-4 shows a concordance of 0.642 for this regression.

» An r (or r²) value that’s interpreted like a correlation coefficient in ordinary regression, meaning the larger the r² value, the better the model fits the data. In Figure 23-4, r² (labeled Rsquare) is 0.116.

» A likelihood ratio test and associated p value that compares the full model, which includes all the parameters, to a model consisting of just the overall baseline function. In Figure 23-4, the likelihood ratio p value is shown as 4.46e-06, which is scientific notation for p = 0.00000446, indicating that a model that includes the CenterCD and Radiation variables can predict survival statistically significantly better than just the overall (baseline) survival curve.

» Akaike’s Information Criterion (AIC) is especially useful for comparing alternative models but is not included in Figure 23-4.
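To make the measures in the list above more concrete, here is a rough Python sketch of how three of them can be computed by hand. The data values, the log-likelihoods, and the function names are hypothetical. The concordance function is a simplified pairwise version that skips tied survival times, so it may not match your software's number exactly on real data.

```python
def concordance(times, events, risk_scores):
    """Simplified pairwise concordance for right-censored data.

    A pair is usable when the participant with the shorter time had an event.
    It is concordant when that participant also has the higher predicted risk.
    Ties in risk count as half; pairs with tied times are skipped here.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable


def likelihood_ratio_statistic(loglik_null, loglik_full):
    """Twice the gain in log-likelihood of the full model over the null model."""
    return 2 * (loglik_full - loglik_null)


def aic(log_likelihood, n_parameters):
    """Akaike's Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical data: event = 1 means the event was observed, 0 means censored.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk_scores = [2.0, 1.5, 1.8, 0.5, 0.7]  # higher = higher predicted hazard

c_index = concordance(times, events, risk_scores)
lr_stat = likelihood_ratio_statistic(loglik_null=-110.0, loglik_full=-100.0)
model_aic = aic(log_likelihood=-100.0, n_parameters=2)

print(c_index)    # 0.75
print(lr_stat)    # 20.0
print(model_aic)  # 204.0
```

The likelihood ratio statistic is compared against a chi-square distribution whose degrees of freedom equal the number of parameters the full model adds, which is how software arrives at the p value. When comparing alternative models fitted to the same data, the one with the lower AIC is generally preferred.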

Focusing on baseline survival and hazard functions

The baseline survival function is represented as a table with two columns (time and predicted survival) and a row for each distinct time at which one or more events were observed.
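The sketch below shows one way such a table can be built in Python, using a Breslow-type estimate of the baseline cumulative hazard. It assumes the regression has already been fitted, so each participant's linear predictor (the sum of coefficients times predictor values) is supplied as input. The data values and the function name are hypothetical, and your software may use a different estimator or define the baseline at the average predictor values rather than at a linear predictor of zero.

```python
import math


def baseline_survival_table(times, events, linear_predictors):
    """Breslow-type estimate of baseline survival, S0(t) = exp(-H0(t)).

    linear_predictors holds each participant's fitted value of
    b1*x1 + b2*x2 + ... from an already-fitted model (assumed here).
    Returns one (time, survival) row per distinct observed event time.
    """
    relative_risk = [math.exp(lp) for lp in linear_predictors]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    cumulative_hazard = 0.0
    table = []
    for t in event_times:
        # Number of events observed at exactly this time.
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        # Sum of relative risks over everyone still under observation at t.
        at_risk = sum(r for ti, r in zip(times, relative_risk) if ti >= t)
        cumulative_hazard += n_events / at_risk
        table.append((t, math.exp(-cumulative_hazard)))
    return table


# Hypothetical data: event = 1 means the event was observed, 0 means censored.
times = [2, 4, 4, 7, 9]
events = [1, 1, 0, 1, 0]
linear_predictors = [0.4, -0.2, 0.1, 0.0, -0.3]

for time, survival in baseline_survival_table(times, events, linear_predictors):
    print(f"{time:>4}  {survival:.3f}")
```

Censored observations don't get their own rows, but they still count in the at-risk total for every event time up to the point at which they were censored.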